
@mohsaka (Contributor) commented Oct 31, 2025

Description

Smaller changes split out of #26445 to make that PR more readable. These are mainly cleanup/refactoring, plus a few small additions.

Motivation and Context

To make the larger PR simpler to review. These changes can stand alone.

Impact

Test Plan

  • Ran the updated TestAnalyzer to make sure the analysis changes didn't break anything.
  • Ran TestTableFunctionInvocation to make sure pushed-down table functions aren't broken.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

@prestodb-ci added the from:IBM label (PR from IBM) Oct 31, 2025
@sourcery-ai (bot) commented Oct 31, 2025

Reviewer's Guide

This PR refactors the table function planning and analysis path. It introduces properOutputs and copartitioningLists in TableFunctionNode, backed by immutable collections; overhauls the pass-through and required-column specifications; updates the parser, analyzer, and planner to consume the new structures; applies minor planner and rewriter enhancements; and adjusts tests to match renamed functions and updated behaviors.

Class diagram for updated TableFunctionNode and related classes

```mermaid
classDiagram
class TableFunctionNode {
  -String name
  -Map<String, Argument> arguments
  -List<VariableReferenceExpression> properOutputs
  -List<PlanNode> sources
  -List<TableArgumentProperties> tableArgumentProperties
  -List<List<String>> copartitioningLists
  -TableFunctionHandle handle
  +List<VariableReferenceExpression> getOutputVariables()
  +List<VariableReferenceExpression> getProperOutputs()
  +List<List<String>> getCopartitioningLists()
}
class TableArgumentProperties {
  -String argumentName
  -boolean rowSemantics
  -boolean pruneWhenEmpty
  -PassThroughSpecification passThroughSpecification
  -List<VariableReferenceExpression> requiredColumns
  -Optional<DataOrganizationSpecification> specification
  +String getArgumentName()
  +PassThroughSpecification getPassThroughSpecification()
  +List<VariableReferenceExpression> getRequiredColumns()
  +Optional<DataOrganizationSpecification> getSpecification()
}
class PassThroughSpecification {
  -boolean declaredAsPassThrough
  -List<PassThroughColumn> columns
  +boolean isDeclaredAsPassThrough()
  +List<PassThroughColumn> getColumns()
}
class PassThroughColumn {
  -VariableReferenceExpression variable
  -boolean isPartitioningColumn
  +VariableReferenceExpression getVariable()
  +boolean isPartitioningColumn()
}
TableFunctionNode "1" *-- "*" TableArgumentProperties
TableArgumentProperties "1" *-- "1" PassThroughSpecification
PassThroughSpecification "1" *-- "*" PassThroughColumn
PassThroughColumn "1" *-- "1" VariableReferenceExpression
```
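The diagram describes getOutputVariables as combining properOutputs with the pass-through columns of each table argument. The following is a minimal, dependency-free sketch of that relationship, not Presto's TableFunctionNode: plain strings stand in for VariableReferenceExpression, the JDK's List.copyOf stands in for Guava's ImmutableList.copyOf, and both the output ordering (proper outputs first, then each argument's pass-through columns in argument order) and the PassThroughSpecification validation rule are assumptions. It uses the singular getOutputVariable name suggested in the review.

```java
import java.util.ArrayList;
import java.util.List;

import static java.util.Objects.requireNonNull;

// Simplified stand-in for TableFunctionNode.PassThroughColumn: a String replaces VariableReferenceExpression.
final class PassThroughColumn
{
    private final String outputVariable;
    private final boolean isPartitioningColumn;

    PassThroughColumn(String outputVariable, boolean isPartitioningColumn)
    {
        this.outputVariable = requireNonNull(outputVariable, "outputVariable is null");
        this.isPartitioningColumn = isPartitioningColumn;
    }

    String getOutputVariable()
    {
        return outputVariable;
    }

    boolean isPartitioningColumn()
    {
        return isPartitioningColumn;
    }
}

// Simplified stand-in for TableFunctionNode.PassThroughSpecification.
final class PassThroughSpecification
{
    private final boolean declaredAsPassThrough;
    private final List<PassThroughColumn> columns;

    PassThroughSpecification(boolean declaredAsPassThrough, List<PassThroughColumn> columns)
    {
        this.declaredAsPassThrough = declaredAsPassThrough;
        this.columns = List.copyOf(requireNonNull(columns, "columns is null"));
        // Assumed validation rule: a table argument that is not declared as pass-through
        // may only pass through its partitioning columns.
        if (!declaredAsPassThrough && !this.columns.stream().allMatch(PassThroughColumn::isPartitioningColumn)) {
            throw new IllegalArgumentException("non-partitioning pass-through column for a table argument not declared as pass-through");
        }
    }

    boolean isDeclaredAsPassThrough()
    {
        return declaredAsPassThrough;
    }

    List<PassThroughColumn> getColumns()
    {
        return columns;
    }
}

final class TableFunctionOutputs
{
    private TableFunctionOutputs() {}

    // Assumed ordering: the function's proper outputs first, then the pass-through
    // columns of each table argument, in argument order.
    static List<String> outputVariables(List<String> properOutputs, List<PassThroughSpecification> passThroughSpecifications)
    {
        List<String> variables = new ArrayList<>(properOutputs);
        for (PassThroughSpecification specification : passThroughSpecifications) {
            for (PassThroughColumn column : specification.getColumns()) {
                variables.add(column.getOutputVariable());
            }
        }
        return List.copyOf(variables);
    }
}
```

For a function with the proper output `score` and a pass-through table argument whose columns are `id` (partitioning) and `name`, `outputVariables` returns `[score, id, name]`. Keeping proper outputs and pass-through columns in separate structures lets the planner rewrite one without rebuilding the other.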

Class diagram for Field class after removal of newUnqualified method

```mermaid
classDiagram
class Field {
  -Optional<NodeLocation> nodeLocation
  -Optional<QualifiedName> relationAlias
  -Optional<String> name
  -Type type
  -boolean aliased
  -Optional<QualifiedName> originTable
  -Optional<String> originColumnName
  -boolean hidden
  +Optional<NodeLocation> getNodeLocation()
  // ... other methods
}
```

Class diagram for QueryPlanner class after visibility change

```mermaid
classDiagram
class QueryPlanner {
  +QueryPlanner(Analysis analysis, VariableAllocator variableAllocator, ...) // now public
  // ... other methods
}
```

Class diagram for SimplePlanRewriter context with new getNodeRewriter method

```mermaid
classDiagram
class SimplePlanRewriter {
  +C get()
  +SimplePlanRewriter<C> getNodeRewriter()
  // ... other methods
}
```

Class diagram for LocalQueryRunner with updated createCatalog method

```mermaid
classDiagram
class LocalQueryRunner {
  +void createCatalog(String catalogName, String connectorName, Map<String, String> properties)
  // ... other methods
}
```

Class diagram for DescriptorField construction in AstBuilder

```mermaid
classDiagram
class DescriptorField {
  +DescriptorField(NodeLocation location, Identifier identifier, Optional<Type> type)
}
```

Class diagram for StatementAnalyzer verifyRequiredColumns logic change

```mermaid
classDiagram
class StatementAnalyzer {
  -void verifyRequiredColumns(TableFunctionInvocation node, Map<String, List<Integer>> requiredColumns)
  // ... logic now uses getVisibleFieldCount instead of getAllFieldCount
}
```
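The switch from getAllFieldCount to getVisibleFieldCount means a table function can no longer require a hidden column of its input. A minimal sketch of that bounds check follows; ArgumentRelation and RequiredColumnsCheck are hypothetical simplifications (a relation is reduced to its two field counts), and the error messages are illustrative rather than Presto's actual text.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified shape of a table argument's relation: only the field counts matter here.
final class ArgumentRelation
{
    final int visibleFieldCount;
    final int allFieldCount;

    ArgumentRelation(int visibleFieldCount, int hiddenFieldCount)
    {
        this.visibleFieldCount = visibleFieldCount;
        this.allFieldCount = visibleFieldCount + hiddenFieldCount;
    }
}

final class RequiredColumnsCheck
{
    private RequiredColumnsCheck() {}

    // Rejects any required-column index outside [0, visibleFieldCount), so hidden columns cannot be required.
    static void verifyRequiredColumns(Map<String, List<Integer>> requiredColumns, Map<String, ArgumentRelation> arguments)
    {
        requiredColumns.forEach((name, indexes) -> {
            ArgumentRelation relation = arguments.get(name);
            if (relation == null) {
                throw new IllegalArgumentException("Required columns specified for unknown table argument: " + name);
            }
            for (int index : indexes) {
                if (index < 0 || index >= relation.visibleFieldCount) {
                    throw new IllegalArgumentException("Invalid index " + index + " of required column from table argument " + name);
                }
            }
        });
    }
}
```

With three visible columns and one hidden column, index 3 is rejected. Assuming the hidden field is positioned after the visible ones, a check against allFieldCount (4) would have let that index reach it, which appears to be the case the updated TestAnalyzer assertion ("Assert failure for hidden column requirement") covers.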

File-Level Changes

Each change is listed with its details, followed by the files it touches.

Refactor TableFunctionNode to support properOutputs, copartitioningLists, and immutability
  • Rename outputVariables to properOutputs and add copartitioningLists field
  • Copy fields into ImmutableList/ImmutableMap and adjust constructors
  • Update getOutputVariables to merge properOutputs with pass-through columns
  • Introduce getProperOutputs and getCopartitioningLists methods
  • Adapt replaceChildren and assignStatsEquivalentPlanNode to include copartitioningLists
  Files: TableFunctionNode.java

Enhance table argument specification with PassThroughSpecification and requiredColumns
  • Add argumentName, replace boolean passThroughColumns with PassThroughSpecification and a requiredColumns list
  • Introduce nested classes PassThroughSpecification and PassThroughColumn with validation
  • Adjust getters in TableArgumentProperties
  Files: TableFunctionNode.java

Integrate new TVF structures into parser, analyzer, and planner
  • AstBuilder: allow nullable type in DescriptorField
  • StatementAnalyzer: validate required column indices against visibleFieldCount
  • RelationPlanner.visitTableFunctionInvocation and UnaliasSymbolReferences: pass copartitioningLists
  Files: AstBuilder.java, StatementAnalyzer.java, RelationPlanner.java, UnaliasSymbolReferences.java

Apply minor planner and rewriter enhancements
  • Make QueryPlanner public and add toSymbolReference helper
  • Add getNodeRewriter to SimplePlanRewriter
  • Implement LocalQueryRunner.createCatalog instead of throwing
  Files: QueryPlanner.java, SimplePlanRewriter.java, LocalQueryRunner.java

Update tests for renamed functions and behaviors
  • Use FUNCTION_NAME constants in TestingTableFunctions and add DifferentArgumentTypesFunction
  • Rename two_arguments_function to two_scalar_arguments_function in TestAnalyzer and adjust error offsets
  • Assert failure for hidden column requirement instead of success
  Files: TestingTableFunctions.java, TestAnalyzer.java


@mohsaka marked this pull request as ready for review October 31, 2025 23:44
@prestodb-ci requested review from a team, Dilli-Babu-Godari, and ShahimSharafudeen, and removed the request for a team, October 31, 2025 23:44
@sourcery-ai (bot) left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • Add explicit requireNonNull checks for copartitioningLists and its nested lists in the TableFunctionNode constructor to guard against null inputs and avoid potential NPEs.
  • Consider renaming PassThroughColumn.getOutputVariables to a singular form like getOutputVariable to better reflect that it returns a single variable and reduce naming confusion.
  • Review the decision to make QueryPlanner.coerce, toSymbolReferences, and toSymbolReference public. If they are only needed for testing or internal use, consider annotating them with @VisibleForTesting or keeping them non-public.
Prompt for AI Agents
Please address the comments from this code review:


## Individual Comments

### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:78-85` </location>
<code_context>
-        this.outputVariables = requireNonNull(outputVariables, "outputVariables is null");
-        this.sources = requireNonNull(sources, "sources is null");
-        this.tableArgumentProperties = requireNonNull(tableArgumentProperties, "tableArgumentProperties is null");
+        this.arguments = ImmutableMap.copyOf(arguments);
+        this.outputVariables = ImmutableList.copyOf(outputVariables);
+        this.sources = ImmutableList.copyOf(sources);
+        this.tableArgumentProperties = ImmutableList.copyOf(tableArgumentProperties);
+        this.copartitioningLists = copartitioningLists.stream()
+                .map(ImmutableList::copyOf)
+                .collect(toImmutableList());
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Immutable collections are used for constructor arguments, but copartitioningLists is not null-checked.

Add a requireNonNull check for copartitioningLists so that a null argument fails fast with a descriptive "copartitioningLists is null" message, consistent with the other constructor arguments.

```suggestion
        this.arguments = ImmutableMap.copyOf(arguments);
        this.outputVariables = ImmutableList.copyOf(outputVariables);
        this.sources = ImmutableList.copyOf(sources);
        this.tableArgumentProperties = ImmutableList.copyOf(tableArgumentProperties);
        this.copartitioningLists = requireNonNull(copartitioningLists, "copartitioningLists is null").stream()
                .map(ImmutableList::copyOf)
                .collect(toImmutableList());
        this.handle = requireNonNull(handle, "handle is null");
```
</issue_to_address>
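The behavior of the suggested constructor change can be checked in isolation. This is a minimal sketch assuming JDK-only code: List.copyOf and Collectors.toUnmodifiableList stand in for Guava's ImmutableList.copyOf and toImmutableList (like Guava's, they reject null elements), and CopartitioningHolder is a hypothetical class holding only the copartitioningLists field.

```java
import java.util.List;
import java.util.stream.Collectors;

import static java.util.Objects.requireNonNull;

// Hypothetical holder for just the copartitioningLists field of TableFunctionNode.
final class CopartitioningHolder
{
    private final List<List<String>> copartitioningLists;

    CopartitioningHolder(List<List<String>> copartitioningLists)
    {
        // requireNonNull fails fast with a descriptive message; List.copyOf then rejects
        // null nested lists and null names, and snapshots each nested list.
        this.copartitioningLists = requireNonNull(copartitioningLists, "copartitioningLists is null").stream()
                .map(List::copyOf)
                .collect(Collectors.toUnmodifiableList());
    }

    List<List<String>> getCopartitioningLists()
    {
        return copartitioningLists;
    }
}
```

Without requireNonNull, a null argument would still throw a NullPointerException from the stream() call; the explicit check only makes the failure message name the parameter. The per-list copy matters because copying the outer list alone would leave callers able to mutate the nested lists after construction.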

### Comment 2
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:116` </location>
<code_context>
+        return variables.build();
+    }
+
+    public List<VariableReferenceExpression> getProperOutputs()
     {
         return outputVariables;
</code_context>

<issue_to_address>
**suggestion:** Method name getProperOutputs is ambiguous.

Consider renaming to getOutputVariablesOnly or a more descriptive name, since getOutputVariables is overridden with different logic.

Suggested implementation:

```java
    public List<VariableReferenceExpression> getOutputVariablesOnly()
    {
        return outputVariables;
    }

```

If there are any calls to `getProperOutputs()` elsewhere in this file or in other files, they should be updated to `getOutputVariablesOnly()` to match the new method name.
</issue_to_address>

### Comment 3
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:265` </location>
<code_context>
+
+    public static class PassThroughColumn
+    {
+        private final VariableReferenceExpression outputVariables;
+        private final boolean isPartitioningColumn;
+
</code_context>

<issue_to_address>
**nitpick:** Field name outputVariables in PassThroughColumn is misleading.

Since outputVariables is a single VariableReferenceExpression, renaming it to outputVariable would improve clarity and maintain naming consistency.
</issue_to_address>

### Comment 4
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:278` </location>
<code_context>
+        }
+
+        @JsonProperty
+        public VariableReferenceExpression getOutputVariables()
+        {
+            return outputVariables;
</code_context>

<issue_to_address>
**nitpick:** Method getOutputVariables in PassThroughColumn should be getOutputVariable.

Rename the getter to getOutputVariable for consistency with the field name and to distinguish it from the plural method in the parent class.
</issue_to_address>

### Comment 5
<location> `presto-main-base/src/test/java/com/facebook/presto/connector/tvf/TestingTableFunctions.java:539-77` </location>
<code_context>
+    public static class DifferentArgumentTypesFunction
</code_context>

<issue_to_address>
**suggestion (testing):** Missing negative/edge case tests for DifferentArgumentTypesFunction.

Please add tests for error conditions and edge cases, such as missing or invalid arguments, to improve validation coverage for this function.

Suggested implementation:

```java
    public static class DifferentArgumentTypesFunction
            extends AbstractConnectorTableFunction
    {
        public static final String FUNCTION_NAME = "different_arguments_function";
        public DifferentArgumentTypesFunction()
        {
            super(
                    SCHEMA_NAME,
                    FUNCTION_NAME,
                    ImmutableList.of(
                            TableArgumentSpecification.builder()

        // Edge/negative case tests for DifferentArgumentTypesFunction
        @Test(expected = IllegalArgumentException.class)
        public void testMissingRequiredArgument()
        {
            // Attempt to call the function with missing required argument
            // This should throw an IllegalArgumentException or similar
            DifferentArgumentTypesFunction function = new DifferentArgumentTypesFunction();
            // Simulate missing argument (actual invocation may differ based on API)
            function.apply(/* missing required argument */);
        }

        @Test(expected = ClassCastException.class)
        public void testInvalidArgumentType()
        {
            // Attempt to call the function with an argument of the wrong type
            DifferentArgumentTypesFunction function = new DifferentArgumentTypesFunction();
            // Simulate invalid type (actual invocation may differ based on API)
            function.apply("invalid_type_instead_of_expected");
        }

        @Test
        public void testBoundaryValues()
        {
            DifferentArgumentTypesFunction function = new DifferentArgumentTypesFunction();
            // Test with boundary values, e.g., empty table, nulls, etc.
            try {
                function.apply(/* boundary value, e.g., empty table or null */);
            } catch (Exception e) {
                // Assert that the function handles boundary values gracefully
                assertTrue(e instanceof IllegalArgumentException || e instanceof NullPointerException);
            }
        }

```

You may need to adjust the `function.apply(...)` calls to match the actual method signature and argument requirements of `DifferentArgumentTypesFunction`. Also, ensure that the test annotations (`@Test`) are compatible with your test framework: JUnit 4 uses `@Test(expected = ...)`, whereas TestNG uses `@Test(expectedExceptions = ...)`.
</issue_to_address>

### Comment 6
<location> `presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestAnalyzer.java:1971` </location>
<code_context>
-        analyze("SELECT * FROM TABLE(system.two_arguments_function(text => 'foo', number => 1))");
+        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo'))");
+        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo'))");
+        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo', 1))");
+        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo', number => 1))");

</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding tests for edge cases with null and default arguments for two_scalar_arguments_function.

Please add tests for cases with default values, single arguments, and combinations of null and default values to fully cover input scenarios for two_scalar_arguments_function.

Suggested implementation:

```java
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo'))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo'))");

        // Edge cases: null and default arguments
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(NULL, 1))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo', NULL))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => NULL, number => 1))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo', number => NULL))");

        // If the function supports default values, test with omitted arguments
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function())"); // both default
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => DEFAULT, number => DEFAULT))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(DEFAULT, DEFAULT))");

        // Combinations of null and default
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(NULL, DEFAULT))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(DEFAULT, NULL))");

        // Single argument cases (already present, but ensure both positional and named)
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo'))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo'))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo', 1))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo', number => 1))");

        // Invalid cases: too many arguments, all null, etc.
        assertFails(TABLE_FUNCTION_INVALID_ARGUMENTS, "line 1:58: Too many arguments. Expected at most 2 arguments, got 3 arguments", "SELECT * FROM TABLE(system.two_scalar_arguments_function(1, 2, 3))");
        assertFails(TABLE_FUNCTION_INVALID_ARGUMENTS, "line 1:58: All arguments are null", "SELECT * FROM TABLE(system.two_scalar_arguments_function(NULL, NULL))");

```

- If `DEFAULT` is not a valid keyword in your SQL dialect, replace it with the appropriate syntax for default arguments.
- If the function does not support omitted arguments or defaults, remove those tests or adjust them to expect failures.
- Ensure that the error messages in `assertFails` match the actual error messages produced by your analyzer for null/default misuse.
</issue_to_address>


@mohsaka requested a review from aditi-pandit October 31, 2025 23:46
@aditi-pandit (Contributor) commented:

@mohsaka, @xin-zhang2: This code looks alright. Let's fix the sourcery-ai review nitpicks.

@jaystarshot: PTAL as well.

Co-authored-by: kasiafi <[email protected]>
Co-authored-by: Xin Zhang <[email protected]>
@xin-zhang2 (Contributor) commented:

@sourcery-ai review

@sourcery-ai (bot) left a comment


Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/testing/LocalQueryRunner.java:775` </location>
<code_context>
     @Override
     public void createCatalog(String catalogName, String connectorName, Map<String, String> properties)
     {
-        throw new UnsupportedOperationException();
+        nodeManager.addCurrentNodeConnector(new ConnectorId(catalogName));
+        connectorManager.createConnection(catalogName, connectorName, properties);
     }
</code_context>

<issue_to_address>
**issue (bug_risk):** createCatalog now performs connector registration.

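If connectorManager.createConnection fails after nodeManager.addCurrentNodeConnector has already run, the node is left advertising a connector that was never created. Please verify that error handling and rollback are in place to prevent this inconsistent state.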
</issue_to_address>
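One way to address the rollback concern is to undo the node registration when connection creation fails. The sketch below uses hypothetical CatalogNodeRegistry and ConnectionFactory interfaces in place of the real node and connector managers; in particular, removeCurrentNodeConnector is an assumed method, and whether Presto's node manager offers an equivalent is not established here.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import static java.util.Objects.requireNonNull;

// Hypothetical stand-in for the node manager used by LocalQueryRunner.
interface CatalogNodeRegistry
{
    void addCurrentNodeConnector(String catalogName);

    void removeCurrentNodeConnector(String catalogName);
}

// Hypothetical stand-in for the connector manager's connection creation.
interface ConnectionFactory
{
    void createConnection(String catalogName, String connectorName, Map<String, String> properties);
}

final class CatalogCreator
{
    private final CatalogNodeRegistry nodeRegistry;
    private final ConnectionFactory connectionFactory;

    CatalogCreator(CatalogNodeRegistry nodeRegistry, ConnectionFactory connectionFactory)
    {
        this.nodeRegistry = requireNonNull(nodeRegistry, "nodeRegistry is null");
        this.connectionFactory = requireNonNull(connectionFactory, "connectionFactory is null");
    }

    void createCatalog(String catalogName, String connectorName, Map<String, String> properties)
    {
        nodeRegistry.addCurrentNodeConnector(catalogName);
        try {
            connectionFactory.createConnection(catalogName, connectorName, properties);
        }
        catch (RuntimeException e) {
            // Undo the registration so the node does not advertise a catalog whose connector was never created.
            nodeRegistry.removeCurrentNodeConnector(catalogName);
            throw e;
        }
    }
}

// In-memory registry used to observe whether the rollback happened.
final class InMemoryCatalogNodeRegistry
        implements CatalogNodeRegistry
{
    final Set<String> catalogs = new HashSet<>();

    @Override
    public void addCurrentNodeConnector(String catalogName)
    {
        catalogs.add(catalogName);
    }

    @Override
    public void removeCurrentNodeConnector(String catalogName)
    {
        catalogs.remove(catalogName);
    }
}
```

An alternative design is to call createConnection first and register the connector on the node only after it succeeds, which avoids the rollback step as long as the registration itself cannot fail.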

### Comment 2
<location> `presto-main-base/src/test/java/com/facebook/presto/connector/tvf/TestingTableFunctions.java:539-77` </location>
<code_context>
+    public static class DifferentArgumentTypesFunction
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding negative and edge case tests for DifferentArgumentTypesFunction.

Please include tests for invalid argument scenarios, such as missing required arguments, incorrect types, duplicate names, and unexpected order, to validate the function's error handling.

Suggested implementation:

```java
    public static class DifferentArgumentTypesFunction
            extends AbstractConnectorTableFunction
    {
        public static final String FUNCTION_NAME = "different_arguments_function";
        public DifferentArgumentTypesFunction()
        {
            super(
                    SCHEMA_NAME,
                    FUNCTION_NAME,
                    ImmutableList.of(
                            TableArgumentSpecification.builder()
        }

        // Negative and edge case tests for DifferentArgumentTypesFunction
        @Test
        public void testDifferentArgumentTypesFunctionNegativeCases()
        {
            // Missing required argument
            try {
                // Simulate call with missing required argument
                callDifferentArgumentsFunction(/* missing required argument */);
                fail("Expected exception for missing required argument");
            }
            catch (IllegalArgumentException | PrestoException e) {
                // Expected
            }

            // Incorrect argument type
            try {
                // Simulate call with incorrect argument type
                callDifferentArgumentsFunction("string_instead_of_table");
                fail("Expected exception for incorrect argument type");
            }
            catch (IllegalArgumentException | PrestoException e) {
                // Expected
            }

            // Duplicate argument names
            try {
                // Simulate call with duplicate argument names
                callDifferentArgumentsFunctionWithDuplicateNames();
                fail("Expected exception for duplicate argument names");
            }
            catch (IllegalArgumentException | PrestoException e) {
                // Expected
            }

            // Unexpected argument order
            try {
                // Simulate call with unexpected argument order
                callDifferentArgumentsFunctionWithWrongOrder();
                fail("Expected exception for unexpected argument order");
            }
            catch (IllegalArgumentException | PrestoException e) {
                // Expected
            }
        }

        // Helper methods to simulate function calls (implementations depend on your test framework)
        private void callDifferentArgumentsFunction(Object... args)
        {
            // Simulate invocation of DifferentArgumentTypesFunction with given args
            // This should trigger validation logic in the function
        }

        private void callDifferentArgumentsFunctionWithDuplicateNames()
        {
            // Simulate invocation with duplicate argument names
        }

        private void callDifferentArgumentsFunctionWithWrongOrder()
        {
            // Simulate invocation with arguments in unexpected order
        }

```

You will need to implement the helper methods (`callDifferentArgumentsFunction`, `callDifferentArgumentsFunctionWithDuplicateNames`, and `callDifferentArgumentsFunctionWithWrongOrder`) according to your test framework and how you invoke table functions in your tests. Make sure these methods actually trigger the validation logic in `DifferentArgumentTypesFunction`.
</issue_to_address>


@xin-zhang2 (Contributor) commented:

@aditi-pandit Addressed all sourcery-ai comments.

@aditi-pandit (Contributor) left a comment


Thanks @mohsaka and @xin-zhang2

@jaystarshot: PTAL.

@aditi-pandit (Contributor) commented:

@xin-zhang2: Can you rebase the code and push again? I have seen the Spark e2e test errors fixed in newer Velox builds.

@jaystarshot (Member) commented:

Can you merge this into #26445, or clarify exactly what is split out? The two PRs contain the same changes, for example:

presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/Field.java



5 participants